Protein-2-BioBrick Sequence Generator
ReadMe File

Files included in this package:
* protein2bioBrick.jar The executable jar file that runs the program
* protein2bioBrick_readme.txt This very text file, containing description and instructions
* Kust_2859_2860_2861.txt Example data that can be used to test the program: the three genes that form the multi-subunit Hydrazine Synthase, as used by the 2016 Kingsborough iGEM team.
* protein2bioBrick.java The source code, which can be modified and used as necessary as long as proper attribution to the original author is present.

Purpose:
 To go, with minimal effort, from a protein sequence to a DNA sequence that can be ordered from Integrated DNA Technologies (IDT) as a gBlock and used as a BioBrick.

Approach:
 This program takes one or more FASTA-formatted protein sequences in a single file and reverse-translates them to DNA, using codons preferred by E. coli. The program then eliminates the restriction sites of the selected BioBrick standard from the DNA sequence (for example EcoRI, XbaI, SpeI, and PstI), while preserving the encoded information. Finally, the sequence is changed (again, while preserving the encoded information) to eliminate any repetitive blocks of 8 nucleotides; that is, any pattern of 8 nucleotides that occurs more than once in the sequence. This last step is performed to satisfy a requirement imposed by IDT during synthesis.
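
 The first step and the two checks described above can be sketched in Java as follows. This is a minimal illustration, not the code in protein2bioBrick.java: the class name, the method names, and the one-codon-per-residue table are assumptions made for this example (the table lists codons commonly cited as frequent in E. coli). The sketch only detects restriction sites and repeated 8-mers; removing them would mean swapping in synonymous codons.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and codon table are assumptions, not the program's own.
final class ReverseTranslationSketch {
    // Assumed E. coli-preferred codon for each amino acid (one-letter code).
    static final Map<Character, String> PREFERRED = new HashMap<>();
    // The four restriction sites named in this README.
    static final Map<String, String> SITES = new LinkedHashMap<>();

    static {
        String[] pairs = {
            "A=GCG", "R=CGT", "N=AAC", "D=GAT", "C=TGC", "Q=CAG", "E=GAA",
            "G=GGC", "H=CAT", "I=ATT", "L=CTG", "K=AAA", "M=ATG", "F=TTT",
            "P=CCG", "S=AGC", "T=ACC", "W=TGG", "Y=TAT", "V=GTG"
        };
        for (String p : pairs) {
            PREFERRED.put(p.charAt(0), p.substring(2));
        }
        SITES.put("EcoRI", "GAATTC");
        SITES.put("XbaI", "TCTAGA");
        SITES.put("SpeI", "ACTAGT");
        SITES.put("PstI", "CTGCAG");
    }

    // Reverse-translate a protein by picking the preferred codon for every residue.
    static String reverseTranslate(String protein) {
        StringBuilder dna = new StringBuilder();
        for (char aa : protein.toUpperCase().toCharArray()) {
            String codon = PREFERRED.get(aa);
            if (codon == null) {
                throw new IllegalArgumentException("Unknown residue: " + aa);
            }
            dna.append(codon);
        }
        return dna.toString();
    }

    // Names of the restriction enzymes whose recognition site occurs in the DNA.
    static List<String> sitesPresent(String dna) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> site : SITES.entrySet()) {
            if (dna.contains(site.getValue())) {
                found.add(site.getKey());
            }
        }
        return found;
    }

    // Every k-nucleotide pattern that occurs more than once (k = 8 for the IDT rule).
    static List<String> repeatedKmers(String dna, int k) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + k <= dna.length(); i++) {
            counts.merge(dna.substring(i, i + k), 1, Integer::sum);
        }
        List<String> repeats = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                repeats.add(e.getKey());
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        String dna = reverseTranslate("MLQK");
        System.out.println(dna);
        System.out.println("Sites found: " + sitesPresent(dna));
        System.out.println("Repeated 8-mers: " + repeatedKmers(dna, 8));
    }
}
```

 With this table, the dipeptide Leu-Gln reverse-translates to CTGCAG, which is exactly the PstI site. This shows why a codon-optimized sequence still needs the site-removal step.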

Instructions:
 Simply open the program, set the options, and open a FASTA file containing one or more protein sequences. The program will then run automatically and save the results. 
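
 For reference, a multi-sequence FASTA file is a series of header lines starting with ">", each followed by one or more lines of sequence. The sketch below shows one way such input could be read in Java. It is an illustration only: the class name, method name, and the example headers are hypothetical, and the program's own parser may behave differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of reading FASTA text; not taken from protein2bioBrick.java.
final class FastaSketch {
    // Returns header -> sequence, in file order; sequence lines are joined together.
    static Map<String, String> parse(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, seq.toString());
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line);
            }
        }
        if (header != null) {
            records.put(header, seq.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up headers and sequences, for illustration only.
        String example = ">subunitA\nMKLQ\nAA\n>subunitB\nMW\n";
        parse(example).forEach((h, s) -> System.out.println(h + " : " + s));
    }
}
```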

Options:
* You can select the desired BioBrick standard; this determines which restriction sites are avoided in your reverse-translated sequence.

* Checking the "Add appropriate prefix and suffix" box will add the prefix and suffix of the selected BioBrick standard to your sequence.

* Job ID# is a random number that is used as the name of the output directory that will be created. You can change this number, or select "Generate New Job ID" to get a new random number. NOTE: If you run the program twice without restarting or getting a new Job ID#, the second run will overwrite the results of the first.

Output:
 Three files are created in a directory indicated by the Job ID#.

* protein2bioBrick_report.txt: This file includes runtime information and a log of changes. For example, it indicates whenever a motif is replaced, or when no instance of a particular motif is found. It also reports the codon preference for the final sequence (the sequence is slightly deoptimized in order to avoid restriction sites and repetitive sequences) and, finally, the time in milliseconds that the program took to run.

* protein2bioBrick_results.txt: This file features the output DNA sequences, each with the original header from the protein FASTA sequence.

* protein2bioBrick_results.txt: This file features a translated version of the DNA sequence (ignoring prefix and suffix) to allow for a quality check, verifying that the DNA sequence encodes the desired protein sequence correctly.
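
 The same quality check can be repeated independently by translating the output DNA with the standard genetic code and comparing it to the input protein. The Java sketch below is one way to do that. The class and method names are hypothetical, and it assumes the prefix and suffix have already been removed from the DNA.

```java
// Hypothetical sketch of a translation check; not taken from protein2bioBrick.java.
final class TranslationCheckSketch {
    // Standard genetic code, codons ordered by base T, C, A, G at each position.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    // Translate a coding DNA sequence; stop codons are returned as '*'.
    static String translate(String dna) {
        String s = dna.toUpperCase();
        if (s.length() % 3 != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 3");
        }
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i < s.length(); i += 3) {
            int a = BASES.indexOf(s.charAt(i));
            int b = BASES.indexOf(s.charAt(i + 1));
            int c = BASES.indexOf(s.charAt(i + 2));
            if (a < 0 || b < 0 || c < 0) {
                throw new IllegalArgumentException("Invalid codon at position " + i);
            }
            protein.append(AMINO_ACIDS.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    // True when the DNA translates back to exactly the given protein sequence.
    static boolean encodes(String dna, String protein) {
        return translate(dna).equals(protein.toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGAAA"));     // prints MK
        System.out.println(encodes("ATGAAA", "MK")); // prints true
    }
}
```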
